- *====================================================================*
- *Everything you wanted to know about Forth (but were afraid to ask). *
- * Copyright (c) 1986, by Scott Ballantyne. *
- *====================================================================*
-
- This file contains a description of how a threaded-code Forth
- compiler works, with specific reference to Blazin' Forth. You don't need
- to know the stuff in this file, unless you are interested in the particulars
- of how Forth compilers work, or are interested in improving or changing one.
- Specifically, I wrote the following as an aid to people who might be
- trying to understand the source to Blazin' Forth, and it should be considered
- part of the documentation for the source files to the system.
-
- To understand this document, you will need a decent knowledge of
- Forth. An understanding of pointers won't hurt any either.
-
- You don't need to know machine language to understand the stuff in this file,
- but you will, of course, need it to understand the actual source.
-
- I have attempted to provide sample code in hi-level Forth that illustrates
- the routines involved. This code is similar to, but not exactly the same as,
- the machine level routines in the actual compiler. In particular,
- you should not expect these routines to work if you type them into
- a Forth system in an attempt to build a "Forth in Forth". They are
- provided to add clarity, and that is their only function.
-
- ---------------------------------------------------------------------------
- ------------------------ What is a Virtual Machine? -----------------------
- ---------------------------------------------------------------------------
-
- A Virtual Machine is a creation, in software, of a piece of hardware.
- Note that this hardware does not actually have to exist - it is only within
- the last year that an actual hardware Forth computer has been built. Forth
- has been around a lot longer than that.
-
- All high level languages are essentially virtual machines, since they
- implement instructions which are not part of the actual hardware CPU.
- As an example, Forth uses two stacks, a parameter stack, and a return stack.
- On an actual hardware Forth computer, the built-in machine language of the
- computer would contain instructions for manipulating each stack, and the
- pointers to the top of each stack. On the 6502, there is actually only
- one stack - one or the other of the Forth stacks must be emulated using
- software routines. So you could say that one stack is a hardware stack,
- and the other stack is a Virtual stack.
-
- As another example, the function call mechanism (how the actual
- functions, procedures, or words are eventually caused to execute) of a
- high level language is rarely directly supported by hardware instructions.
- On the majority of CPUs in use on personal computers today, the only
- real function call mechanism implemented in the hardware is the
- subroutine call (usually referred to as a JSR or CALL instruction). This
- instruction usually only saves the return address automatically. Any other
- information or any other method of invoking a subroutine must be done by
- software that, essentially, is a software (or virtual) function call, as
- opposed to the hardware function call and return. This is particularly true
- of threaded code Forth, and the routine which implements this function call
- mechanism is called NEXT. Understanding NEXT is the key to understanding
- the functioning of the Forth compiler at its lowest level.
-
- --------------------------------------------------------------------------
- -------------------------- Introducing NEXT. -----------------------------
- --------------------------------------------------------------------------
-
- To understand how NEXT (and the various Forth machine registers that
- NEXT uses to do its thing) works, let's first take a quick look at the
- structure of a compiled higher level Forth word. It is essentially a list
- of addresses:
-
- : EXAMPLE W1 W2 W3 ;
-
- ( Standard Forth Header goes here)
- Address of W1
- Address of W2
- Address of W3
- Address of EXIT ( ; )
-
- NEXT uses several auxiliary registers to keep track of where the user
- program is. On a CPU with many registers, these would be kept in a selected
- CPU register. On the 6502, which has only 4 user accessible registers,
- these are maintained as virtual registers (page zero locations are used
- for greater speed). One of these virtual registers is called the
- Interpreter Pointer, or IP for short, and it is responsible for keeping
- track of the progress of the current program. When NEXT is entered, in the
- course of running a program, the IP will be pointed at the word we want
- to execute. NEXT does some stuff (to be described in a moment) to cause
- this word to begin to execute, but before transferring control to this
- word, it moves the IP ahead to the next word's address, so it will know what
- word to run the next time it is called.
-
- Here is a sample execution of EXAMPLE, given above:
-
- IP --> Address of W1 ( NEXT executes W1, first moving IP ahead to W2)
- IP --> Address of W2 ( NEXT executes W2, first moving IP ahead to W3)
- IP --> Address of W3 ( NEXT executes W3, first moving IP ahead to EXIT)
- IP --> Address of EXIT
-
- I like to think of the IP as a kind of Address Slider, that can be moved
- ahead or behind to direct the flow of execution of the current program.
-
- Ultimately, of course, NEXT must cause machine language instructions to be
- executed, which essentially means changing the hardware program counter of
- the CPU to point to the appropriate batch of instructions to be executed.
- It does this using another virtual register called the Current Word Pointer,
- or W for short. To understand this portion of NEXT, we have to clear up
- exactly what we mean by "Address of Word" in the above discussion.
-
- The individual members of the list that makes up a Forth definition's
- executable body are the addresses of the code fields in the headers of the
- compiled words. (These "addresses of code fields" will be referred to as
- "the execution address" of a word in the rest of this document. When the
- term "code field" is used, the reference will be to the actual code field
- portion of a dictionary header. Or, at least, that is how I am going to
- try to use these terms.)
-
- This execution address (as you may recall) itself stores an address which
- points to machine level (assembly language) instructions.
- It is these instructions that NEXT causes the CPU to execute, by forcing
- the CPU's program counter to the address stored at the execution address of
- that word. So we have a couple of levels of indirection here:
-
- The IP points to a location which holds the execution address of a
- word.
- The execution address pointed to by the location pointed to by the IP
- points to executable machine language instructions.
-
- So the full story of NEXT is as follows:
- 1) NEXT retrieves the value stored at the address in the IP.
- 2) It saves this value (the execution address of a word)
- in W (the current word pointer).
- 3) It then moves the IP ahead to point at the next word to be
- executed.
- 4) Finally, it forces the hardware program counter to the value
- stored at the address in W, which causes machine level
- instructions to execute.
-
- Here is an example of the full execution of a forth word.
-
- Let's make up some example addresses for our example execution:
-
- Address Contents Description
- -----------------------------------------------------------------------
- $A000 [ $0600 ] W1's execution address is $A000, and contains $0600.
- $B000 [ $0700 ] W2's execution address is $B000, and contains $0700.
- $C000 [ $0800 ] W3's execution address is $C000, and contains $0800.
- $0900 [ $0880 ] EXIT's execution address is $0900, and contains $0880.
-
- And here is how the compiled EXAMPLE word from earlier looks - let's say
- that the body of example starts at $E000:
-
- Address Contents Description
- ----------------------------------
- $E000 [ $A000 ] Compiled W1
- $E002 [ $B000 ] Compiled W2
- $E004 [ $C000 ] Compiled W3
- $E006 [ $0900 ] Compiled EXIT.
-
- So, at the entry to NEXT, the IP will contain $E000.
-
- NEXT fetches the address stored here, and stuffs it into W, so W will now
- contain $A000, which is the execution address of W1.
-
- NEXT now increments the IP to point at the next word, so the IP will contain
- $E002.
-
- Finally, NEXT forces the program counter of the CPU to the address stored
- in the address stored in W. So here's a quickie quiz - what will be the
- address in the hardware program counter?
-
- (Answer: $0600 - which is the address of the machine language code for W1).
-
- Here is a quick synopsis of the values stored in the IP, W, and hardware
- PC for the execution of EXAMPLE, given above. It might be a good idea
- to pause here, and try to run through the rest of the example on your own,
- to check your understanding (and the clarity of my explanation) of how
- NEXT functions.
-
- Word-to-Execute   IP      W       IP-AT-EXIT   PC
- ------------------------------------------------------
- W1                $E000   $A000   $E002        $0600
- W2                $E002   $B000   $E004        $0700
- W3                $E004   $C000   $E006        $0800
- EXIT              $E006   $0900   $E008        $0880
-
- As a final aid to understanding, here is an implementation of NEXT in
- hi-level Forth:
-
- : NEXT IP // Get address of IP
- @ // Get value of IP (address of next word to execute)
- @ // Get that word's execution address
- W ! // And stuff into the current word pointer.
- 2 IP +! // Move IP along to next word, for next time.
- W @ // Get the execution address from W.
- @ // Get the actual address of the code.
- PC ! // Force into hardware PC, so that it will execute.
- ;
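-
- For readers who would like something they can actually run, here is a small
- model of the same four steps in Python. It is only a sketch: memory is a
- Python dict of cells (an assumption made for brevity), the addresses are the
- made-up ones from the tables above, and next_step is a hypothetical name. It
- should reproduce the W, IP-AT-EXIT and PC columns of the synopsis table.
-

```python
# Toy model of NEXT. Memory is a dict of 16-bit cells keyed by address;
# the addresses are the made-up ones from the example tables.
mem = {
    0xA000: 0x0600,  # W1's code field -> machine code at $0600
    0xB000: 0x0700,  # W2's code field
    0xC000: 0x0800,  # W3's code field
    0x0900: 0x0880,  # EXIT's code field
    0xE000: 0xA000,  # body of EXAMPLE: compiled W1
    0xE002: 0xB000,  # compiled W2
    0xE004: 0xC000,  # compiled W3
    0xE006: 0x0900,  # compiled EXIT
}

def next_step(ip):
    """One pass through NEXT; returns (w, new_ip, pc)."""
    w = mem[ip]      # steps 1-2: fetch the execution address into W
    ip += 2          # step 3: move IP ahead one cell
    pc = mem[w]      # step 4: the code field's contents go to the PC
    return w, ip, pc

ip = 0xE000
trace = []
for _ in range(4):
    w, ip, pc = next_step(ip)
    trace.append((w, ip, pc))
    print(f"W=${w:04X}  IP-AT-EXIT=${ip:04X}  PC=${pc:04X}")
```

- The model never runs the machine code at PC, so it cannot follow what EXIT
- really does to the IP; it only shows the bookkeeping NEXT itself performs.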
-
- Now that you understand NEXT (I hope), and the role of the Forth registers
- IP and W, you are in a good position to understand the rest of the
- Forth system.
-
- ---------------------------------------------------------------------------
- ---------------- EXECUTE - or how Forth launches programs -----------------
- ---------------------------------------------------------------------------
-
- You might be wondering at this point exactly how an application gets
- launched in the first place. Since NEXT uses the IP, and assumes that the
- IP is pointing at a compiled execution address, how do words that you just
- type in from the terminal get executed? Obviously, words typed directly
- to the interpreter from the terminal don't have an address which is valid
- for the IP.
-
- The answer is the Forth word EXECUTE, which takes an execution
- address as its argument. When you type a word to the interpreter that
- it can find in the dictionary, it pushes the execution address of the
- word onto the parameter stack, and calls EXECUTE. Execute first saves this
- execution address in W, and then forces the PC to the address stored in
- this execution address, just like the last part of NEXT. Here is EXECUTE
- in high level forth:
-
- : EXECUTE ( execution-address --- )
- W ! // save in W - then do last part of NEXT
- W @ @ // get the address of the code to execute
- PC ! // and execute it.
- ;
-
- Note that EXECUTE does not call NEXT - it assumes the EXECUTEd word
- will be doing that.
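-
- The same idea as a runnable Python sketch. The register dict, the
- machine_code table and the address $1234 standing in for "somewhere inside
- INTERPRET" are all assumptions made for illustration. The point to notice
- is that EXECUTE sets W and the PC but leaves the IP alone.
-

```python
mem = {0xA000: 0x0600}                   # code field of a hypothetical word W1
log = []
machine_code = {0x0600: lambda: log.append("W1 ran")}  # stand-in for real code

def execute(stack, regs):
    regs["W"] = stack.pop()              # W !
    regs["PC"] = mem[regs["W"]]          # W @ @  PC !
    machine_code[regs["PC"]]()           # the CPU "jumps" to the code

regs = {"IP": 0x1234, "W": 0, "PC": 0}   # IP: made-up spot inside INTERPRET
stack = [0xA000]                         # execution address left by the interpreter
execute(stack, regs)
print(regs, log)
```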
-
- At this point you are no doubt wondering how the IP gets initialized at all.
- It's not hard to understand, but let's put off a detailed discussion of it
- until we talk about the DOCOLON and EXIT routines, a little further on, but
- here is a brief hint: When EXECUTE executes your word, there is already a
- valid value in the IP - it is pointing somewhere inside INTERPRET. If the
- word you are executing is a colon definition, then the first thing it does
- is save the current value of the IP, and then changes it to point to itself.
- A CODE definition won't change the IP at all (unless the code you write is
- supposed to), and so the pointer to inside INTERPRET just hangs around until
- the code definition gets to its NEXT call, which causes the INTERPRET word
- to resume. This will all become clearer when you understand exactly how
- DOCOLON and EXIT work.
-
- Incidentally, there is also an EXECUTE inside of the compiler loop - it's
- there to handle IMMEDIATE definitions - the ones that execute even when you
- are compiling. The logic here is the same as above. The only difference is
- that the IP will be pointing somewhere inside ] , instead of inside
- INTERPRET.
-
- ----------------------------------------------------------------------------
- ------------------------- How Forth Does Branching -------------------------
- ----------------------------------------------------------------------------
-
- In our discussion of NEXT, above, we only talked about sequential
- execution of words. What happens if we need to branch around words (as
- we do in conditionals like IF) or cause the same words to be executed
- repeatedly (as we do in DO LOOP or BEGIN UNTIL constructs)?
-
- The answer is actually very simple - we just change the IP to point
- to the word we want to branch to, and then execute NEXT. If you followed the
- above discussion on NEXT, it should be obvious that this causes a complete
- diversion of the flow of control for the current word.
-
- When a branch is compiled, two things are done: a special word that
- controls the branch is compiled, and the destination address of the branch
- is compiled. For example:
-
- : CR'S BEGIN CR AGAIN ;
-
- This word will just print newlines, until a rude action is taken by the
- operator to stop it. Here is how the compiled word looks in memory.
-
- (Standard Forth Header goes here)
- $A000 CR ( execution address of CR)
- $A002 BRANCH ( execution address of BRANCH)
- $A004 $A000 ( address to BRANCH to)
- $A006 EXIT ( execution address of EXIT)
-
- When CR'S executes, NEXT executes CR, and then it executes BRANCH.
- BRANCH takes the address immediately following it in memory, in this case
- $A000, and stuffs it into the IP. BRANCH then JMP's to NEXT. Since the
- IP is once again pointing at CR, (having been changed by BRANCH), NEXT
- once again executes CR, and then BRANCH, which causes the IP to be changed,
- and so on, forever.
-
- BRANCH is an example of an unconditional branching primitive - it always
- branches, no matter what. ?BRANCH is a conditional branching word - it
- will branch if the value on the top of the stack is FALSE - otherwise,
- no branch takes place. Here is an example of a word that would cause ?BRANCH
- to be compiled:
-
- : CR? ( BOOLEAN -- ) IF CR THEN ;
-
- CR? will obviously print a CR if the top of the stack is non-zero, otherwise,
- nothing happens. Here is how CR? would look in memory:
-
- (Standard Forth Header)
- $A000 ?BRANCH ( execution address of ?BRANCH)
- $A002 $A006 ( destination address of branch )
- $A004 CR ( execution address of CR)
- $A006 EXIT ( execution address of EXIT)
-
- In this case, NEXT would execute ?BRANCH first, which tests the
- value of the top of the stack. Notice that two things can happen here,
- BOTH of which will change the IP:
-
- 1) If the top of the stack is TRUE, ?BRANCH will force the IP
- beyond the branch address, by adding two. This will obviously
- cause CR to be executed.
- 2) If the top of the stack is FALSE, ?BRANCH will act exactly like
- BRANCH, and stuff $A006 (the address stored in the cell immediately
- following ?BRANCH) into the IP, which will obviously just EXIT the
- definition.
-
- In any branching word, one or the other of these two things will happen.
- All of the branching words are compiled in exactly this way, with the
- branching primitive first, and the destination address of the branch
- immediately following it in memory. The reason that there are more
- branching primitives in Blazin' Forth than just these two has more to do with
- entry and exit conditions than it does with the actual branching mechanism.
-
- For example, IF-THEN, IF-ELSE-THEN, BEGIN-UNTIL, BEGIN-WHILE-REPEAT and
- BEGIN-AGAIN are all implemented with combinations of ?BRANCH and BRANCH,
- since all of these either test a boolean on the top of the stack or simply
- jump.
-
- Things like DO-LOOP and ?DO-LOOP and DO-+LOOP, etc. have additional things
- to do, like move the loop parameters to the return stack, add or subtract
- the loop index, test the loop index, and clean up the return stack on
- the loop exit. But the actual mechanics of branching are exactly the same,
- only the entry/exit conditions differ from word to word. Among other
- advantages, it makes the compiler code much simpler, since there are fewer
- 'special cases' to check for.
-
- Once again, as an aid to understanding, here are sample implementations of
- BRANCH and ?BRANCH in hi-level forth. As you read these, keep in mind that
- when BRANCH or ?BRANCH is executing, the IP will be pointing at the branch
- address - since it gets incremented before the execution of the next word
- by NEXT:
-
- : BRANCH ( branch unconditionally STACK: -- )
- IP @ // Get the value of IP, ordinarily the address of the cell
- // holding the next word to execute. In this case, it is
- // the address of the cell holding the branch destination.
- @ // Get the value stored at that address - which is the
- // destination branch value.
- IP ! // Change the IP to the destination address.
- NEXT // and execute.
- ;
-
- : ?BRANCH ( conditional branch STACK: BOOLEAN -- )
- 0= IF // test top of stack - if FALSE ( equal to zero)
- BRANCH // just execute BRANCH
- ELSE // value was TRUE, don't branch.
- 2 IP +! // move IP over branch address, to next word.
- THEN
- NEXT // and execute.
- ;
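-
- Here is a runnable Python sketch of the same branching logic, using the
- CR? layout from earlier ($A000 to $A006). To keep it short, memory cells
- hold the names of primitives instead of execution addresses, and
- run_cr_query is a hypothetical name; both are simplifications of mine.
-

```python
def run_cr_query(flag):
    """Run the compiled body of CR? with `flag` on the parameter stack."""
    mem = {0xA000: "?BRANCH", 0xA002: 0xA006, 0xA004: "CR", 0xA006: "EXIT"}
    stack, out, ip = [flag], [], 0xA000
    while True:
        word = mem[ip]              # NEXT: fetch the word ...
        ip += 2                     # ... and step the IP past it
        if word == "BRANCH":
            ip = mem[ip]            # IP @ @ IP !
        elif word == "?BRANCH":
            if stack.pop() == 0:
                ip = mem[ip]        # FALSE: take the branch
            else:
                ip += 2             # TRUE: skip over the branch address
        elif word == "CR":
            out.append("\n")
        elif word == "EXIT":
            return out

print(run_cr_query(-1))   # TRUE flag: CR is executed
print(run_cr_query(0))    # FALSE flag: the branch skips CR
```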
-
- ----------------------------------------------------------------------------
- ---------------- How Forth Does Nesting - DOCOLON and EXIT -----------------
- ----------------------------------------------------------------------------
-
- In the above examples there was never any question of remembering where we
- came from - the course of execution of the word was changed, and we never
- really cared to remember what called what. But what about having one colon
- definition calling another one? How does Forth remember where to come
- back to when it has finished the called definition?
-
- This is not particularly difficult either. Once again, the IP and W, the
- current word pointer, play central roles. In what follows, remember that
- W points to the actual address of the word we want to execute, while the IP
- points to a memory location which contains the address of the word.
-
- What happens is this:
- NEXT starts to execute a colon definition. All colon definitions have the
- same address stored at their execution address, which is the address of a
- machine language routine called DOCOLON or NEST. It is this routine that
- is responsible for saving the current execution environment.
-
- DOCOLON first pushes the current value of the IP (which holds the address
- of the word we want to return to) onto the return stack. At this point,
- W will be holding the execution address of the new word to execute. We want
- to execute the body of this word, so DOCOLON now adds two to the value in W,
- which makes it point to the BODY of this definition, and stuffs it into the
- IP. DOCOLON now calls NEXT, which causes the new word to execute.
-
- Eventually, NEXT will execute EXIT, which is the word compiled by ; .
- EXIT's job is to restore the previous execution environment, and it does
- this very simply by pulling the top of the return stack, and stuffing it
- into the IP. It then calls NEXT, which causes the calling word to resume
- execution as though nothing had happened.
-
- Here is an example:
-
- : FOOBAR CR ;
-
- : COLON-CALL FOOBAR ;
-
-
- Compiled view of the above:
-
- (Header for FOOBAR)
- $A000 DOCOLON ( Code field portion of header)
- $A002 CR ( Body )
- $A004 EXIT
-
- (Header for COLON-CALL)
- $B000 DOCOLON ( Code field portion of header)
- $B002 FOOBAR ( Body)
- $B004 EXIT
-
- And here is a simplified execution of COLON-CALL.
-
- Word-to-Execute        IP      W         IP-AT-EXIT   RETURN-STACK
- -------------------------------------------------------------------------
- FOOBAR                 $B002   $A000     $B004        XXXXX
- DOCOLON                $B004   $A000     $A002        $B004
- CR                     $A002   CR's EA   $A004        $B004
- EXIT                   $A004   EXIT EA   $B004        XXXXX
- EXIT (in COLON-CALL)   $B004   EXIT EA   -----        ------
-
- (NOTE: EA stands for Execution Address.)
-
- Once again, here is a sample implementation in higher level forth, of
- DOCOLON and EXIT:
-
- : DOCOLON IP @ // get current value of IP
- >R // Save it on return stack
- W @ // Get execution address of current word.
- 2+ // Convert to Address of body.
- IP ! // Change IP
- NEXT // Execute new word
- ;
-
- : EXIT R> // Get old IP (saved by DOCOLON)
- IP ! // Restore
- NEXT // Resume execution.
- ;
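-
- The nesting can also be traced with a short Python model. This is a sketch
- under several assumptions: the execution addresses chosen for CR and EXIT
- ($C000 and $C100) are made up, code fields hold routine names rather than
- machine addresses, and the sentinel $0000 stands in for "somewhere inside
- INTERPRET" so the model knows when to stop.
-

```python
CR_EA, EXIT_EA = 0xC000, 0xC100      # made-up execution addresses
INTERPRET = 0x0000                   # sentinel: "somewhere inside INTERPRET"
mem = {
    0xA000: "DOCOLON", 0xA002: CR_EA,  0xA004: EXIT_EA,   # FOOBAR
    0xB000: "DOCOLON", 0xB002: 0xA000, 0xB004: EXIT_EA,   # COLON-CALL
    CR_EA: "CR", EXIT_EA: "EXIT",                         # CODE words
}

def run(start_ea):
    rstack, trace = [], []
    ip, w = INTERPRET, start_ea      # state right after EXECUTE set W
    while True:
        routine = mem[w]             # contents of the code field -> "PC"
        if routine == "DOCOLON":
            rstack.append(ip)        # IP @ >R
            ip = w + 2               # W @ 2+ IP !
        elif routine == "EXIT":
            ip = rstack.pop()        # R> IP !
        trace.append((routine, ip, list(rstack)))
        if ip == INTERPRET:          # back in the outer interpreter
            return trace
        w = mem[ip]                  # NEXT: fetch the execution address
        ip += 2                      #       and advance the IP

trace = run(0xB000)                  # EXECUTE COLON-CALL
for routine, ip, rs in trace:
    print(f"{routine:8} IP=${ip:04X} RSTACK={[f'${a:04X}' for a in rs]}")
```

- The return stack grows by one cell for each DOCOLON and shrinks by one for
- each EXIT, which is all there is to nesting.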
-
- Since the most recent caller is always at the top of the return stack, the
- forth system can find its way through any number of levels of nesting, no
- matter how deep. There is no theoretical limit to the depth of nesting of
- forth words, although there is the practical limit of the size of the return
- stack.
-
- So how deeply can one nest definitions in Blazin' Forth? Well, the obvious
- answer is around 123 levels, since there is an entire page of memory
- allocated for the return stack. It is equally obvious that certain actions
- can modify this, such as pushing literals to the return stack in your
- definitions, or using DO LOOPS, since DO LOOPS store the loop control
- information on the return stack.
-
- Less obviously, you should note that CODE definitions do not cause the above
- nesting to occur. The majority of the primitives in Blazin' Forth are
- CODE definitions, and the desire to maximize the level of nesting was one
- of the design considerations that led to this decision.
-
- In practice, I have never even approached the theoretical maximum level for
- nesting, much less had a crash that was traceable to return stack overflow,
- even when using highly recursive words.
-
- ----------------------------------------------------------------------------
- ------------------------- Forth's DOES> construct --------------------------
- ----------------------------------------------------------------------------
-
- The implementation of the DOES> feature of Forth is usually one of
- the hardest for people to understand. The thing to remember when we get down
- to the actual details of the implementation is exactly how the current word
- pointer W works. When a word is executing, W will contain the execution
- address of that word. Stored at the address in W is the actual address of
- the code that is executing. In Forth, we would say that W @ is the execution
- address of the word, and W @ @ is the address of the code. Keeping this
- in mind will help you to understand what is going on.
-
- First, a quick refresher on what DOES> does. DOES> is possibly the
- most distinctive feature of Forth, since it allows you to extend the actual
- Forth compiler to compile new types of words. DOES> words are compiler
- words, and as such, they are used to create new words to execute. To help
- keep the discussion clear, let's call words which contain DOES> parent words,
- and words which are created by DOES> words, child words.
-
- When a parent word executes, it creates a dictionary entry for the
- child. When the child executes, it leaves the address of its body on the
- parameter stack, and then executes the hi-level forth words after DOES> in
- the parent word. A common way to teach beginners about DOES> is to redefine
- one of the Forth primitives, such as CONSTANT, as a DOES> word. I'll do the
- same thing here, but I will also try to explain exactly how these words
- do their thing on an implementation level.
-
- : CONSTANT CREATE , DOES> @ ;
-
- Here we have our CONSTANT definition. When CONSTANT (the parent)
- executes, it will create a dictionary entry with a standard header (that's
- the function of the CREATE in our definition). It then allocates two bytes
- of parameter space, and compiles the value on the top of the stack into the
- dictionary (that's the function of the , in our definition). The words
- following the DOES> don't do anything when CONSTANT executes - they execute
- when the child word (the word created by CONSTANT) executes.
-
- When the child executes, it will leave the address of its body on
- the stack, and then the words following the DOES> will be executed. In this
- example, there is only the @ - which will replace the address of the BODY
- with the value stored there, just like CONSTANT should, and EXIT, which will
- return us to wherever we came from.
-
- Thus
-
- 10 CONSTANT TEN
-
- creates a dictionary entry for the name TEN, and a 2 byte parameter field for
- the value 10, which CONSTANT also stuffs there.
-
- Executing
-
- TEN
-
- will first leave the address of TEN's body on the stack, and then the words
- following DOES> (in the parent word CONSTANT) will execute, which result in
- the value 10 being left.
-
- Now for the implementation details:
-
- Here is how our definition of CONSTANT would look in the dictionary:
-
- (Preceded, as always, by the standard Forth header )
- $A000 DOCOLON ( code field portion of header)
- $A002 CREATE ( execution address )
- $A004 , ( execution address )
- $A006 (;CODE) ( execution address )
- $A008 JMP DODOES ( actual machine language instructions)
- $A00B @ ( execution address)
- $A00D EXIT ( execution address)
-
- And here is how the definition of TEN would look:
-
- ( Standard dictionary header goes here)
- $B000 $A008 ( code field portion of header)
- $B002 10 ( value stored in parameter field)
-
- Ok, here is how it all sorts out. Remember that DOES> is defined as an
- IMMEDIATE word, and so it executes when you are compiling. The mysterious
- portion of the CONSTANT definition, above - the (;CODE) and the JMP DODOES -
- is written into the dictionary whenever DOES> executes.
-
- (;CODE) is an unusual primitive. When it executes, it overwrites the current
- contents of the code field of the last word added to the dictionary with
- the address of the machine code which follows it in the definition currently
- executing. In our example above, it will cause all words created with
- CONSTANT to have a code field whose value is $A008 - the address of the
- JMP instruction in CONSTANT. This will obviously cause the JMP DODOES
- instruction to be executed each time a word created by CONSTANT is executed.
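-
- The patching step can be sketched in a few lines of Python. The names
- latest_code_field and semicolon_code, and the placeholder "DOCREATE" for
- whatever CREATE originally put in the code field, are assumptions of mine.
- How control leaves CONSTANT afterwards is not modelled.
-

```python
mem = {
    0xA006: "(;CODE)",     # compiled inside CONSTANT
    0xB000: "DOCREATE",    # TEN's code field as CREATE left it (placeholder)
    0xB002: 10,            # TEN's parameter field, filled in by ,
}
latest_code_field = 0xB000  # code field of the last word added to the dictionary

def semicolon_code(ip):
    """NEXT has already bumped the IP, so it points at the JMP DODOES."""
    mem[latest_code_field] = ip   # overwrite the child's code field

semicolon_code(0xA008)
print(f"TEN's code field now holds ${mem[0xB000]:04X}")
```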
-
- DODOES is the routine that does the actual magic. It must do three things:
-
- 1) It must save the current value of the IP (just like DOCOLON) so
- Forth knows how to get back to the caller.
- 2) It must push the address of the child's body to the stack.
- 3) It must execute the words following the JMP DODOES in the parent.
-
- Using TEN as an example, DODOES must push the value $B002 to the parameter
- stack, and it must then cause the words starting at $A00B to be executed.
-
- Here is how it's done in Blazin' Forth:
-
- When TEN executes, it should be clear that the value stored in the current
- word pointer ( W ) is $B000, which is the execution address of TEN. The
- IP will be pointing somewhere important, so DODOES first saves it, which it
- does exactly like DOCOLON, by pushing it onto the return stack.
-
- Once the IP has been safely tucked away, we have two tasks to perform. We
- must push the address of the parameter field of TEN to the stack, and we
- must then cause the hi-level forth words in the DOES> part of CONSTANT to
- execute. We can use the value of W to do both these things.
-
- Remember that W, the current word pointer, is currently pointing at the
- execution address of TEN, and so contains $B000. So it is a simple matter to
- calculate the address of the body of TEN - we just add two to the current
- value of W (which gives us $B002), and push it to the parameter stack.
-
- Now, the value stored at $B000 is $A008, which is the address of the
- JMP DODOES instruction in CONSTANT. We want to execute the hi-level forth
- words beyond this instruction - a piece of cake. We simply add 3 (the size
- of an absolute JMP instruction on the 6502) to the value stored at the
- execution address of the child word TEN, and stuff it into the IP. Once this
- has been done, all we need to do is call NEXT, which takes care of everything
- else, since we just pointed the IP at the proper spot.
-
- Since we saved the previous value of the IP first off, when the EXIT
- at the end of the DOES> stuff is executed, we get returned to whatever called
- us.
-
- Once again, here is an example of DODOES in hi-level forth:
-
- : DODOES IP @ >R // Save current IP on return stack
- W @ 2+ // Leave address of parameter field on stack
- W @ @ // Get address of JMP DODOES instruction
- 3 + // Add in size of JMP absolute instruction
- IP ! // Set as new execution address
- NEXT // and execute it.
- ;
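-
- Here is the same sequence as a runnable Python sketch, using the TEN and
- CONSTANT addresses from above. Memory cells hold word names instead of
- execution addresses, and the caller's IP ($1234) is a made-up value; both
- are simplifications for illustration.
-

```python
mem = {
    0xA008: "JMP DODOES",   # 3-byte machine instruction inside CONSTANT
    0xA00B: "@",            # hi-level words after DOES>
    0xA00D: "EXIT",
    0xB000: 0xA008,         # TEN's code field
    0xB002: 10,             # TEN's parameter field
}

def dodoes(ip, w, stack, rstack):
    rstack.append(ip)       # IP @ >R   save the caller's IP
    stack.append(w + 2)     # W @ 2+    address of the child's body
    return mem[w] + 3       # W @ @ 3 + new IP, just past the JMP

stack, rstack = [], []
caller_ip = 0x1234          # arbitrary IP of whoever called TEN
does_ip = dodoes(caller_ip, 0xB000, stack, rstack)

ip = does_ip
while True:                 # run the words after DOES>
    word = mem[ip]          # NEXT: fetch ...
    ip += 2                 # ... and advance
    if word == "@":
        stack.append(mem[stack.pop()])
    elif word == "EXIT":
        ip = rstack.pop()   # back to the caller
        break

print(f"DOES> part starts at ${does_ip:04X}, stack={stack}, IP=${ip:04X}")
```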
-
- That wasn't so hard, was it?
-
- ----------------------------------------------------------------------------
- ------------------- LITERALS, CONSTANTS, and VARIABLES ---------------------
- ----------------------------------------------------------------------------
-
- In this last section, I talk about how Blazin' Forth handles compiled
- literals, and how the words defined by CONSTANT, VARIABLE and USER are
- implemented.
-
- There are two kinds of literals recognized by Forth, numeric and string.
- Numeric literals are compiled automagically, by the compiler loop, while
- string literals are compiled by ." (usually).
-
- Numeric literals first. As you probably remember, when you compile a
- definition, Forth attempts to look up each word in the definition in the
- dictionary. If it finds the word, then it compiles the execution address
- of the word into the dictionary (unless the word is defined IMMEDIATE, of
- course!). If the word is not found, then it attempts to convert the string
- of characters you just fed it into a number. If it succeeds, then it
- compiles a special primitive called (LIT) into the dictionary, and
- immediately past that, it places the value of your literal. (If it can't
- convert it to a number, then it issues the famous "NOT IN CURRENT SEARCH
- ORDER" message.)
-
- Here is an example:
-
- : BIG 1000 ;
-
- and here is how it looks in the dictionary:
-
- (standard forth header)
- $A000 (LIT) ( start of BIG's BODY)
- $A002 1000 ( The value of your literal)
- $A004 EXIT ( EXIT - Tadah!)
-
- (LIT)'s function is to place the value following it in memory on the top
- of the parameter stack, and to move the IP over the literal value, to the
- next valid forth word. It's pretty simple in practice, if you remember
- that if (LIT) is being executed, the IP must be pointing at the address
- of the literal, since it was incremented by NEXT. Here it is as an
- example FORTH definition:
-
- : (LIT) ( -- 16bit)
- IP @ @ // Get value of literal to stack.
- 2 IP +! // Move IP past literal value, to next valid word.
- NEXT // and call NEXT
- ;
-
- Blazin' Forth has a memory saving feature for values that will fit in one
- byte. For these values another word, called CLIT is compiled, instead of
- (LIT). It works very similarly to (LIT):
-
- : CLIT ( --- byte-value)
- IP @ C@ // get the byte to the parameter stack
- 1 IP +! // move over byte literal to next valid word.
- NEXT // and execute next
- ;
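-
- Both literal primitives can be modelled in Python with a byte-addressed
- memory. In this sketch the helper names (store_cell, fetch_cell, lit, clit)
- and the address $C002 used for the byte literal are assumptions of mine;
- the low-byte-first cell layout is the 6502's normal one.
-

```python
mem = bytearray(0x10000)              # 64K of byte-addressed memory

def store_cell(addr, value):
    mem[addr] = value & 0xFF          # low byte first, as on the 6502
    mem[addr + 1] = value >> 8

def fetch_cell(addr):
    return mem[addr] | (mem[addr + 1] << 8)

def lit(ip, stack):
    stack.append(fetch_cell(ip))      # IP @ @
    return ip + 2                     # 2 IP +!

def clit(ip, stack):
    stack.append(mem[ip])             # IP @ C@
    return ip + 1                     # 1 IP +!

store_cell(0xA002, 1000)              # the literal compiled in BIG
mem[0xC002] = 7                       # a one-byte literal at a made-up address
stack = []
ip_after_lit = lit(0xA002, stack)     # IP was left at $A002 by NEXT
ip_after_clit = clit(0xC002, stack)
print(stack, hex(ip_after_lit), hex(ip_after_clit))
```

- Note that after CLIT the IP has moved by one byte only, so the words that
- follow a byte literal sit at an odd offset from the ones before it.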
-
- The case of string literals is very similar. ." is an immediate word which
- first compiles (.") . It then searches the input stream for an ending ", and
- moves everything before this final quote into the dictionary, with a leading
- count byte, as is normal for Forth. It also moves the pointers to the
- input stream past the string, so the interpreter won't try to evaluate it.
- Here is an example:
-
- : GREETING ." HELLO" ;
-
- And here is how GREETING would look in memory:
-
-         (Header)
- $A000   (.")        ( primitive to print the following inline string)
- $A002   5           ( the length of the string)
- $A003   H E L L O   ( The characters are stored here, one per byte )
- $A008   EXIT
-
- The (.") primitive is one of the few low level words in Blazin' Forth that
- is actually written in Forth (i.e. it's a colon definition). Since (.")
- is a colon definition, this means that when (.") is called, DOCOLON will
- save the current value of the IP on the return stack. But, by a pretty
- stroke of fate, this will be exactly the address of the string following
- (."). To get a little more concrete about it:
-
- When GREETING executes, the IP will eventually contain the value $A000. This
- will cause NEXT to execute (."), but NEXT will first, as always, bump the
- IP to $A002 (the start of the inline string). When (.") executes, since it
- is a colon definition, DOCOLON will push $A002 (the current IP) to the
- return stack, and then enter the definition. So at entry, we have the
- address of the string on the return stack. All we have to do is retrieve the
- address, use COUNT and TYPE to display it, and adjust the return address
- on the stack before we exit. Once the return address has been adjusted and
- placed back on the stack, EXIT will return us to the word past the end of
- the inline string.
-
- Here is (."), just as it appears in Blazin' Forth:
-
- : (.") ( --- )
-     R@        ( get string address from return stack)
-     COUNT     ( get the count byte, adjust address )
-     DUP 1+    ( total length of string, including count byte)
-     R> +      ( get address, move past end of string)
-     >R        ( and restore, for EXIT)
-     TYPE      ( the string)
- ;             ( and return, using adjusted address as return)
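-
- The address arithmetic in (.") can be checked on its own, without touching
- the return stack. This is a sketch, assuming an ANS style Forth (it uses
- CHAR); INLINE-STRING, AFTER-STRING and SKIP-STRING are made-up names. Note
- that COUNT + lands on the same address as the DUP 1+ R> + sequence above:
- the string address plus the length plus one for the count byte.
-
```forth
\ A counted string laid out the way ." leaves it in the dictionary.
CREATE INLINE-STRING  5 C,  CHAR H C, CHAR E C, CHAR L C, CHAR L C, CHAR O C,
HERE CONSTANT AFTER-STRING       \ where the next compiled word (EXIT) would sit

: SKIP-STRING ( addr -- addr' )  COUNT + ;   \ step over count byte and characters

INLINE-STRING COUNT TYPE                     \ prints HELLO
INLINE-STRING SKIP-STRING AFTER-STRING = .   \ prints -1 (true)
```
-
- With the addresses from the GREETING example, this is $A002 + 5 + 1 = $A008,
- which is exactly where EXIT lives.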
-
- So much for literals.
-
- The run time action of constants and variables (including USER variables)
- is determined by the routines pointed to by their code fields. There are
- no special primitives compiled, as there are with the literals.
-
- Here is a short run down of the actions of each:
- Constants place the value stored in their body on the parameter stack.
- Variables place the address of their body on the parameter stack.
- User variables place the address of the associated variable on the stack. The
- actual value stored in the parameter field is an offset from a base address.
-
- Armed with your present knowledge of the IP and W, understanding these
- definitions should be a snap. They all work very much the same. We start with
- variables, since they are the simplest.
-
- When the code field of a variable (or constant or USER) is executed, W
- will contain the execution address (the address of the code field) of the
- word in question. So it's easy: take the value in W, add two, and leave that
- value on the stack. Here is a hi-level definition of DOVARIABLE:
-
- : DOVARIABLE ( -- address)
-     W @         ( Get the execution address of this variable)
-     2+          ( Add two to get the body.)
-     NEXT        ( and that's it!)
- ;
-
- Constants are very similar to variables - the only difference is the extra
- step required to retrieve the value in the constant's body. Here is a hi-level
- definition of DOCONSTANT:
-
- : DOCONSTANT ( -- value)
-     W @ 2+      ( As in variable - get the address of the body.)
-     @           ( Get the value stored there.)
-     NEXT
- ;
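-
- You can get a feel for the same behavior from ordinary Forth by building
- your own defining words with CREATE and DOES>. This is a sketch, assuming a
- Forth-83 or ANS style system; MY-VARIABLE, MY-CONSTANT, COUNTER and ANSWER
- are made-up names. It is roughly the hi-level counterpart of DOVARIABLE and
- DOCONSTANT, not how Blazin' Forth actually implements them.
-
```forth
: MY-VARIABLE ( "name" -- )    CREATE 0 , ;          \ run time: leave body address
: MY-CONSTANT ( n "name" -- )  CREATE ,  DOES> @ ;   \ run time: body address, then fetch

MY-VARIABLE COUNTER
42 MY-CONSTANT ANSWER

7 COUNTER !
COUNTER @ .      \ prints 7
ANSWER .         \ prints 42
```
-
- The one extra @ after DOES> is the same one extra step that separates
- DOCONSTANT from DOVARIABLE.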
-
- USER variables are very similar to constants. The only addition here is that
- we add the base address of the user area to the value stored in the body of
- the user variable.
-
- : DOUSER ( -- address )
-     W @         ( get execution address of this user variable)
-     2+ C@       ( get offset - we only use byte offsets in Blazin' Forth.)
-     UP @        ( get base address of user area)
-     +           ( add to offset to get actual address of variable)
-     NEXT
- ;
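-
- The offset-plus-base idea can be modelled in ordinary Forth as well. This is
- a sketch, assuming a Forth-83 or ANS style system; UP-MODEL, AREA-A, AREA-B,
- MY-USER and SCRATCH are made-up names, and UP-MODEL is just a variable
- standing in for the real UP.
-
```forth
\ Hypothetical model: UP-MODEL plays the part of UP; both areas are scratch memory.
CREATE AREA-A  16 ALLOT
CREATE AREA-B  16 ALLOT
VARIABLE UP-MODEL

: MY-USER ( offset "name" -- )
   CREATE C,                    \ body holds a one-byte offset
   DOES> C@  UP-MODEL @ + ;     \ run time: offset + current base address

4 MY-USER SCRATCH

AREA-A UP-MODEL !   SCRATCH AREA-A - .    \ prints 4
AREA-B UP-MODEL !   SCRATCH AREA-B - .    \ prints 4
```
-
- The same word lands in whichever area the base pointer selects, because
- only the offset is stored in its body.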
-
- Here is a question for those who want to test their general comprehension
- of the topics discussed here. Why can't we use the IP instead of W in the
- definitions of DOVARIABLE, DOCONSTANT, and DOUSER?
-
- Answer:
- Aside from making the definition more complex, it would be impossible to
- retrieve the addresses of variables, or the values of constants, when we
- are typing their names directly into the interpreter from the terminal!
- Remember that the interpreter launches programs by stuffing the
- execution address of a word into W. In the following situation, there is
- no way to get from the address of the IP to the address of the parameter
- field:
-
- VARIABLE BLETCH
- BLETCH . XXXX
-
- Here the IP is still pointing somewhere inside INTERPRET. The only pointer
- that is valid to code such as DOVARIABLE in all cases is W.
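-
- You can see the result of this from the keyboard. The following is a sketch,
- assuming a Forth-83 or ANS style system; VIA-COLON is a made-up name. A
- variable has to give the same address whether it is typed in directly,
- reached from inside a colon definition, or launched with EXECUTE, which is
- handed nothing but the word's own address.
-
```forth
VARIABLE BLETCH
: VIA-COLON ( -- addr )  BLETCH ;     \ BLETCH reached through a colon definition

BLETCH VIA-COLON = .                  \ prints -1 (true): same address either way
' BLETCH EXECUTE  BLETCH = .          \ prints -1 (true): EXECUTE needs only the word's address
```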